AD 


Award Number: W81XWH-04-1-0081


TITLE:  Molecular  Aspects  of  Muscle  Damage  and  Denervation  with 
Public  Access  Tools 


PRINCIPAL INVESTIGATOR: Eric P. Hoffman, Ph.D.


CONTRACTING  ORGANIZATION:  Children's  National  Medical  Center 

Washington,  DC  20010-2910 


REPORT  DATE:  December  2004 


TYPE  OF  REPORT:  Annual 


PREPARED  FOR:  U.S.  Army  Medical  Research  and  Materiel  Command 
Fort  Detrick,  Maryland  21702-5012 


DISTRIBUTION  STATEMENT:  Approved  for  Public  Release; 

Distribution  Unlimited 


The  views,  opinions  and/or  findings  contained  in  this  report  are 
those of the author(s) and should not be construed as an official
Department  of  the  Army  position,  policy  or  decision  unless  so 
designated  by  other  documentation. 


REPORT  DOCUMENTATION  PAGE 


Form  Approved 
OMB  No.  074-0188 


Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining
the data needed, and completing and reviewing this collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for
reducing this burden to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of
Management and Budget, Paperwork Reduction Project (0704-0188), Washington, DC 20503


1. AGENCY USE ONLY  2. REPORT DATE  3. REPORT TYPE AND DATES COVERED

(Leave  blank)  December  2004  Annual  (1  Dec  2003  -  30  Nov  2004) 


4.  TITLE  AND  SUBTITLE 

Molecular  Aspects  of  Muscle  Damage  and  Denervation  with 
Public  Access  Tools 

5.  FUNDING  NUMBERS 

W81XWH-04-1-0081 

6.  AUTHOR(S) 

Eric  P.  Hoffman,  Ph.D. 

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)

Children's  National  Medical  Center 

Washington,  DC  20010-2910 

8. PERFORMING ORGANIZATION REPORT NUMBER

E-Mail: ehoffman@cnmcresearch.org

9.  SPONSORING  /  MONITORING 

AGENCY  NAME(S)  AND  ADDRESS(ES) 

U.S.  Army  Medical  Research  and  Materiel  Command 

Fort  Detrick,  Maryland  21702-5012 

10. SPONSORING / MONITORING AGENCY REPORT NUMBER

11. SUPPLEMENTARY NOTES

12a. DISTRIBUTION / AVAILABILITY STATEMENT

Approved  for  Public  Release;  Distribution  Unlimited 


12b.  DISTRIBUTION  CODE 


13. ABSTRACT (Maximum 200 Words)

The overriding hypothesis of this proposal is that muscle tissue, and its major cell type (the myofiber), is an ideal platform on which to
test the power of post-genomic research, including integration of DNA, mRNA, and protein data. This tissue is also of considerable
importance to the military (muscle function of personnel), and to their families (muscular dystrophy is among the most common of the

INTRODUCTION: Narrative that briefly (one paragraph) describes the subject, purpose and scope of
the research.

BODY:  Following  is  a  re-statement  of  hypotheses  and  tasks,  and  an  update  on  each  for  the  previous  12 
months. 

Hypothesis: Muscle tissue, and its major cell type (the myofiber), is an ideal platform on which to test the
power of post-genomic research, including integration of DNA, mRNA, and protein data.

Task  1.  Create  a  public  access  data  warehouse  for  muscle  with  quality  control  and  standard  operating 
procedures,  using  a  standardized  platform,  including  muscle  disease,  exercise  physiology,  and  plasticity 
following  muscle  damage. 

Aim 1A. Based upon our experience in generating the largest amount of public access vertebrate
expression profile data, we will re-design our current integrated internal Oracle LIMS, web Oracle data
warehouse, and conversion utilities (NCBI GEO) to reflect changes in GeneChip data structure, web-based
query tools, and cross-project comparisons of data. (Year 1).

This Aim has largely been completed as originally proposed, and we have exceeded previous
goals in some aspects. As discussed in the original proposal, we had previously implemented a first
generation web Oracle database at the http://microarray.cnmcresearch.org site, and had published some
initial user analysis tools (see Chen et al. 2004; Appendix 1). We have since finished the complete
re-design as described in the proposal, and as detailed more specifically at our new web site database
(http://pepr.cnmcresearch.org). This newly implemented database has accomplished all goals as set forth
in the original proposal. These are described more fully below.

Perhaps  the  best  indication  of  usefulness  of  the  public  resource  is  the  amount  of  usage  by  the 
public.  The  user  stats  are  given  in  the  summary  section.  Here,  we  show  the  very  recent  surge  in  number 
of  data  downloads  (number  of  Affymetrix  microarrays).  Note  that  the  graph  includes  only  the  first  week 
of  December,  but  all  of  November.  From  this,  it  is  evident  that  4,800  Affymetrix  microarrays  were 
downloaded  from  PEPR  in  November  2004  alone  (the  database  went  live  only  a  few  months  before,  and 
was  not  advertised  until  bugs  were  worked  out).  This  makes  it  one  of  the  most  active  public  resources  for 
microarray  data. 

Aim 1B. Following re-design, the profile data warehouse will be implemented and populated with 50
projects and 1,500 QC/SOP vertebrate expression profiles. (Year 1).

We have exceeded this aim by completing more than 200 projects for about 100 investigators and
50 institutions (Appendix 2). Of these, 54 projects containing 1,830 Affymetrix profiles (experiments) have been
released to the public, typically prior to publication, via the http://pepr.cnmcresearch.org site. The
Appendix also lists all publications emanating from these projects to date.

Aim 1C. A novel data warehouse visualization tool will be implemented on the web site, using the
TreeMap visualization program, with functional clusters related to muscle plasticity and disease
implemented via simple user interfaces. In this manner, profiles can be quickly studied for “atrophic”,
“hypertrophic”, “dysferlin-deficient”, and other gene clusters. This will provide a novel web-based
method for molecular diagnostics, pathway dissection, and identification of active transcript units and
functional clusters in any muscle having an “unknown” pathogenesis. (Years 1 and 2).

We  have  implemented  a  series  of  tools  on  PEPR.  First,  we  have  implemented  the  time  series 
query,  as  described  in  the  first  generation  resource  (Chen  et  al.  2004;  Appendix  1).  Second,  we  have 
implemented  a  new  Chart  function  through  integration  with  NCBI  GEO  where  hierarchical  clustering  is 
available.  This  clustering  tool  is  available  for  only  a  subset  of  projects  at  this  point,  but  will  be  extended 
to  the  entire  database  in  year  2. 

A new interface that we have implemented, but did not propose in the original application, involves
the establishment of an SAS server (see http://sas.cnmcresearch.org). This site is not yet available to the
general public, but we are proud of the utilities and advantages of this new effort. We have
implemented  both  a  very  large  spinal  cord  injury  data  set  (5  time  points  after  four  types  of  injury,  with 
profiles  at,  above,  and  below  the  site  of  injury;  approximately  300  profiles  visualized),  and  a  130  biopsy 
muscular  dystrophy  dataset.  This  interface  allows  very  fast  dynamic  queries  of  biochemical  pathways, 
individual  genes  or  probe  sets,  or  lists  of  genes  (all  implemented  in  the  muscular  dystrophy  data  set;  only 
single gene query in spinal cord data set). In year 2, we will implement all public projects
on this interface, and include gene ontology and biochemical pathway queries. These will include the
“atrophic”, “hypertrophic” and other “response clusters”, as noted in the original aims.

Summary of Deliverables for Aim 1, year 1.

• PEPR (Public Expression Profiling Resource): Completed at http://pepr.cnmcresearch.org.

• 547 Java classes

• 164 JSP pages

• Additional graphical gene query analysis features (GEO clustering, log/linear, abs/normalized, actual time point/evenly distributed)

• Logging feature

• LIMS/PEPR synchronization

• Proposal submission process

• Remote Affymetrix data submission

Goals  for  year  2.  Enable  remote  user  uploading  of  data  into  PEPR. 

•  Web  analysis  tools:  Time  series  and  GEO  clustering  enabled  as  originally  proposed. 

Goals for year 2. Implement the SAS server interface for 20 muscle- and spinal-cord injury related
projects, and integrate into PEPR.

• Increased public access and data downloads: 4,800 profiles downloaded in November 2004;
225 registered users.

Task 2. Define the molecular remodeling of the myofiber following two specific conditions known to
damage muscle in humans: an atrophic stimulus (disuse following injury, denervation), and regeneration
following injury. The injury model used involves “compensatory” changes that prevent further damage
to the muscle; these compensatory remodeling events will be defined.

Specific Aim 2a. Atrophic stimuli induce a series of active ubiquitination pathways, and the protein
targets of the atrophic-specific SCF-complex (atrogin, muscle ring protein) can be identified by normal
water and O18 water-based tryptic digestions of control and atrophic muscle, screening defined fractions
(myofibrillar, cytosolic, membrane) for ubiquitination products. (Year 1).

We have successfully implemented proteomic profiling in the laboratory, although the “base
technique” has changed from O18 water-based tryptic digestions to C13-metabolic labeling comparative
methods. This change was made for the following reasons during the initial funding
period:


•  O18  water-based  tryptic  digestions  were  found  to  suffer  from  inefficiency  of  the  reactions,  making 
comparative  proteomics  less  quantitative  than  needed. 

• We hired a junior faculty member, Yetrib Hathout, who had more experience and success with the
C13-metabolic labeling comparative methods.

For  the  atrophic  stimuli  and  measurements  of  ubiquitination  pathways,  the  switch  in  methodology 
also  required  a  switch  in  experimental  approach  from  in  vivo  to  in  vitro.  Briefly,  metabolic  labeling  is 
done  only  in  tissue  culture  conditions,  where  a  sample  with  all  lysines  and  arginines  replaced  by  a  stable 
isotope  is  compared  to  a  control  culture.  We  have  spent  the  initial  year  implementing  these  new  methods, 
and  in  year  2  will  conduct  both  2D  and  shotgun  proteomic  profiling  after  stimulation  of  the  atrophic 
pathway by serum deprivation or glucocorticoid administration.
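
As a rough illustration of how a labeled (heavy) culture is compared to a control (light) culture, the sketch below computes the expected mass shift of a tryptic peptide and a heavy/light intensity ratio. It assumes, for illustration only, that every lysine (K) and arginine (R) carries six 13C atoms in place of 12C; the report does not state the exact labeled amino acid forms used. The peptide sequence, the intensities, and the function names are hypothetical.

```python
# Mass difference between one 13C atom and one 12C atom, in daltons.
C13_MINUS_C12 = 1.0033548


def heavy_mass_shift(peptide, labeled_carbons_per_residue=6):
    """Expected mass shift (Da) of a fully labeled peptide.

    Assumption: each K and R residue has `labeled_carbons_per_residue`
    carbons replaced by 13C (6 would correspond to fully 13C-labeled
    lysine or arginine side chains and backbones).
    """
    labeled_residues = sum(peptide.count(aa) for aa in "KR")
    return labeled_residues * labeled_carbons_per_residue * C13_MINUS_C12


def heavy_to_light_ratio(light_intensity, heavy_intensity):
    """Relative abundance of the labeled sample versus the control."""
    if light_intensity <= 0:
        raise ValueError("light intensity must be positive")
    return heavy_intensity / light_intensity


# Hypothetical tryptic peptide ending in a single lysine.
shift = heavy_mass_shift("LSDGVAVLK")
ratio = heavy_to_light_ratio(light_intensity=100.0, heavy_intensity=300.0)
print(f"mass shift: {shift:.4f} Da, heavy/light ratio: {ratio:.1f}")
```

Because trypsin cleaves after lysine and arginine, most tryptic peptides carry at least one labeled residue under this assumption, which is what makes the heavy and light forms separable in the mass spectrum.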

Summary of Deliverables for Aim 2, year 1.


Establishment  of  proteomic  profiling  methodologies:  A  series  of  successful  proteomic 
profiling  experiments  using  metabolic  labeling  of  cultured  cells  has  been  completed,  with  a  manuscript 
submitted  for  publication  and  under  revision. 

We have acquired 4 gigabytes of shotgun (Finnigan LTQ electrospray) data from one experiment,
and are beginning to develop the bio-informatics methods to both provide automated analyses of this high
throughput profiling data, and develop methods for integration into our PEPR web database.

Goals for year 2. Develop bio-informatics methods for automated analyses of high throughput
shotgun proteomic profiling of metabolic labeling experiments. Conduct an experiment of glucocorticoid
response to induce the atrogin-1 ubiquitin ligase pathway, and identify ubiquitinated targets.

Additional  complementary  grant  sources  received:  The  preliminary  data  generated  under  the 
auspices  of  this  grant  allowed  us  to  apply  for  a  national  Core  center  for  proteomics  of  premature  birth  in 
an  NIH  network.  We  have  won  this  competition.  Also,  the  issue  of  corticosteroids  and  denervation 
initiating  the  ubiquitin  ligation  pathway  has  been  expanded  in  the  context  of  FY05  DoD  funding,  and  has 
been  approved  for  funding. 

Specific Aim 2b. Dystrophin-deficient mdx mouse muscle shows normal histopathology until 3-4 wks of
age, whereupon large-scale necrosis ensues, followed by effective remodeling commensurate with
decreased sensitivity to dystrophin-deficiency. We hypothesize that the proteomic comparisons of
membrane proteins in the “pre-necrotic” vs. “effectively regenerated” myofibers will permit
identification of the remodeling that desensitizes the myofiber to lack of dystrophin. (Year 2).

The  previous  reviewers  of  the  original  application  felt  that  this  sub  aim  was  “overly  ambitious”, 
and  CDMRP  program  personnel  requested  that  we  remove  this  sub  aim  from  the  proposal.  Thus,  no 
progress  is  reported  on  this  sub  aim. 

KEY  RESEARCH  ACCOMPLISHMENTS:  Bulleted  list  of  key  research  accomplishments  emanating 
from  this  research. 

• PEPR (Public Expression Profiling Resource): Completed at http://pepr.cnmcresearch.org.

o  54  public  projects 
o  1830  profiles 
o  226  registered  users 

•  Web  analysis  tools:  Time  series  and  GEO  clustering  enabled  as  originally  proposed. 

•  Increased  public  access  and  data  downloads. 

•  Establishment  of  proteomic  profiling  methodologies. 

REPORTABLE  OUTCOMES:  Provide  a  list  of  reportable  outcomes  that  have  resulted  from  this  research 
to  include: 

http://pepr.cnmcresearch.org

CONCLUSIONS:  We  have  succeeded  in  surpassing  the  originally  proposed  Aim  1,  with  the 
implementation  and  use  of  an  advanced  web  Oracle  public  access  database  of  QC/SOP  Affymetrix 
microarray  expression  profiles.  Aim  2  on  proteomic  profiling  remains  under  development,  and  is  to 
expand  in  year  2. 

REFERENCES:  none 

APPENDICES: 

Appendix 1. Chen et al. 2004

Appendix  2.  List  of  projects  and  resulting  publications. 

ABSTRACT

Publicly accessible DNA databases (genome browsers) are rapidly accelerating post-genomic
research (see http://www.genome.ucsc.edu/), with integrated genomic DNA, gene structure,
EST/splicing and cross-species ortholog data. DNA databases have relatively low dimensionality;
the genome is a linear code that anchors all associated data. In contrast, RNA expression and
protein databases need to be able to handle very high dimensional data, with time, tissue, cell
type and genes as interrelated variables. The high dimensionality of microarray expression
profile data, and the lack of a standard experimental platform, have complicated the development
of web-accessible databases and analytical tools. We have designed and implemented a public
resource of expression profile data containing 1024 human, mouse and rat Affymetrix GeneChip
expression profiles, generated in the same laboratory, and subject to the same quality and
procedural controls (Public Expression Profiling Resource; PEPR). Our Oracle-based PEPR data
warehouse includes a novel time series query analysis tool (SGQT), enabling dynamic generation
of graphs and spreadsheets showing the action of any transcript of interest over time. In this
report, we demonstrate the utility of this tool using a 27 time point, in vivo muscle
regeneration series. This data warehouse and associated analysis tools provide access to
multidimensional microarray data through web-based interfaces, both for download of all types
of raw data for independent analysis, and also for straightforward gene-based queries. Planned
implementations of PEPR will include web-based remote entry of projects adhering to quality
control and standard operating procedure (QC/SOP) criteria, and automated output of alternative
probe set algorithms for each project (see http://microarray.cnmcresearch.org/pgadatatable.asp).

INTRODUCTION  AND  DATABASE  DESCRIPTION 

PEPR  provides  centralized  Affymetrix  expression  profiling 
data  to  the  public  research  community,  typically  before 
publication  in  primary  research  papers.  Data  released  through 
PEPR  are  generated  within  a  single  centralized  research  group 
(Children’s  National  Medical  Center,  Microarray  Center), 
with  projects  originating  internally  and  referred  from  external 
institutions.  Currently,  1024  Affymetrix  arrays  representing 
38  projects  (13  human;  25  mouse/rat)  are  released  to  the 
public.  PEPR  is  an  Oracle-based  web  solution,  which  permits 
researchers  seamless  access  to  an  Affymetrix-only  expression 
profiling  database  through  our  web  browser  without  requiring 
their  own  Affymetrix  software.  The  web  interface  also  enables 
users to export many forms of data associated with any
particular profile, including raw image files (.dat), processed
image files (.cel) and interpretation files (.txt). It allows
researchers to perform on-line queries of expression profiles
by any number of experimental variables (tissue, species, chip
type, etc.). Other built-in functions include searching by
GenBank Accession ID and gene name (gene-based cross-profile
search). These search functions return signal (Avg
Diff)  values  and  Present/Absent  Calls  (MAS5)  for  all  profiles 
in  PEPR.  We  also  designed  and  implemented  an  automated 
back-end  process  that  disseminates  all  available  PEPR  profile 
data  into  NCBI  Gene  Expression  Omnibus  (GEO)  database 
(http://www.ncbi.nih.gov/geo/)  (1).  Public  users  can  easily 
access  deposited  data  in  GEO  as  well  as  original  data  files  in 
the  PEPR  database  through  a  corresponding  link  created 
during  the  direct  deposit  process. 
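
The gene-based cross-profile search described above can be pictured with a small relational sketch. PEPR itself is Oracle-based and its schema is not given here, so the example below uses an in-memory SQLite database as a stand-in; the table names, column names, accession strings, and signal values are all hypothetical and chosen only to show the shape of such a query.

```python
import sqlite3

# In-memory SQLite stand-in for the Oracle warehouse (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE profile (
    profile_id INTEGER PRIMARY KEY,
    project    TEXT,
    species    TEXT,
    tissue     TEXT,
    chip_type  TEXT
);
CREATE TABLE expression (
    profile_id     INTEGER REFERENCES profile(profile_id),
    accession      TEXT,
    gene_name      TEXT,
    signal         REAL,
    detection_call TEXT   -- 'P' (present) or 'A' (absent)
);
""")
conn.executemany(
    "INSERT INTO profile VALUES (?, ?, ?, ?, ?)",
    [
        (1, "muscle regeneration", "mouse", "muscle", "U74A"),
        (2, "spinal cord trauma", "rat", "spinal cord", "U34A"),
    ],
)
conn.executemany(
    "INSERT INTO expression VALUES (?, ?, ?, ?, ?)",
    [
        (1, "ACC0001", "myogenin", 1600.0, "P"),
        (1, "ACC0002", "myosin heavy chain", 5200.0, "P"),
        (2, "ACC0001", "myogenin", 35.0, "A"),
    ],
)


def gene_query(connection, term):
    """Return signal and present/absent call across all profiles for a
    term matching either an exact accession or part of a gene name."""
    sql = """
        SELECT p.profile_id, p.species, p.tissue,
               e.gene_name, e.signal, e.detection_call
        FROM expression AS e
        JOIN profile AS p ON p.profile_id = e.profile_id
        WHERE e.accession = ? OR e.gene_name LIKE ?
        ORDER BY p.profile_id
    """
    return connection.execute(sql, (term, f"%{term}%")).fetchall()


for row in gene_query(conn, "myogenin"):
    print(row)
```

Keeping experimental variables (species, tissue, chip type) on the profile table and measurements on a separate table is one plausible way to support both variable-based and gene-based searches from the same data.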

To  our  knowledge,  the  PEPR  data  warehouse  is  the  largest 
such  public  resource  adhering  to  quality  control  and  standard 
operating  procedures  (QC/SOP).  However,  we  recognized  that 
the  utility  of  PEPR  is  dependent  on  some  familiarity  with 
bioinformatics  aspects  of  microarray  experiments,  where  files 
could  be  downloaded  and  analyzed  with  any  method  desired. 


*To  whom  correspondence  should  be  addressed.  Tel:  +1  202  884  6011;  Fax:  +1  202  884  6014;  Email:  ehoffman@cnmcresearch.org 
Nucleic  Acids  Research,  Vol.  32,  Database  issue  ©  Oxford  University  Press  2004;  all  rights  reserved 

[Figure 1 screenshot: entry screen of the time series query tool, listing the available mouse and rat time series projects, with instructions to select a project, select the profiles of interest ("select all"), enter a gene query (gene name or accession number), and choose "Display Results".]


Figure  1.  Initial  database  query  for  the  time  series  query  tool. 


To begin to build truly user-friendly web-based data analysis
tools that do not require experience in formatting and
interpretation of microarray data, we designed and implemented
a Single Gene Query Tool (SGQT) (see
http://microarray.cnmcresearch.org/singlegenemain.asp).

SINGLE  GENE  QUERY  TOOL  (SGQT) 

Our  initial  implementation  of  SGQT  is  for  time  series  data, 
which  we  present  here.  We  provide  an  entry  screen  that 
defines  the  data  subset  selections  that  are  available  for  the  user 
to  search  (Fig.  1).  The  specific  projects  available  fitting  the 
search  criteria  are  then  presented,  and  selection  of  one  project 
leads  to  a  list  of  all  profiles  associated  with  the  project.  In  the 
example  we  describe  here,  a  54  profile,  27  time  point  muscle 
regeneration  series  was  selected,  with  two  different  muscles 
profiled  at  each  time  point  on  U74A  microarrays  containing 
~12  000  probe  sets  (2,3).  The  user  is  asked  to  select  the 
profiles  to  be  studied  (‘select  all’  is  the  option  used  here  to 
query  all  54  profiles).  A  web  browser-style  search  query  is 
then  evoked,  and  entry  of  any  text  or  probe  set  then  queries 
genome  databases  for  all  genes  and  probe  sets  matching  the 
query. For example, entry of ‘myosin’ will identify myosin
heavy chains, light chains, binding proteins, etc. The user then
selects  the  desired  gene  from  the  pull  down  result  menu. 
Query  of  ‘myogenin’  returns  only  a  single  probe  set,  which, 
when  selected  (‘submit’)  then  triggers  the  database  query  tool. 
The  tool  then  dynamically  extracts  data  from  the  .cel  files  for 
the  myogenin  probe  set  from  the  54  profile  (12  000  probe  sets/ 
profile)  data  set,  including  signal  (normalized  hybridization 
intensity),  and  absent/present  calls  (Affymetrix  MAS  5.0 
determinations).  The  tool  then  aligns  all  data  into  a  time  series, 
and  graphs  replicates  for  each  time  point  (Fig.  2),  as  well  as 
calculating  the  average  of  the  replicates,  graphing  the  average, 
and  drawing  a  graph  line  through  the  averages  for  all  time 
points  (Fig.  2).  The  tool  also  calculates  the  average  signal  for 
each  time  point,  and  the  fold-change  relative  to  time  0  (based 
upon  array-normalized  intensities)  (Fig.  2). 
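
The alignment, replicate averaging, and fold-change steps described above can be sketched as follows. This is a minimal illustration and not the SGQT implementation; the function name and the signal values are hypothetical, and reporting fold-change as a simple ratio to the time 0 average (so that decreases appear as values below 1) is an assumption.

```python
from collections import defaultdict
from statistics import mean


def summarize_time_series(measurements, baseline_time=0.0):
    """Average replicate signals per time point and compute fold-change
    relative to the baseline time point.

    measurements: iterable of (time_point, signal) pairs, one per profile.
    Returns a list of (time_point, average_signal, fold_change) tuples
    sorted by time point.
    """
    by_time = defaultdict(list)
    for time_point, signal in measurements:
        by_time[time_point].append(signal)

    averages = {t: mean(values) for t, values in by_time.items()}
    if baseline_time not in averages:
        raise ValueError("no profiles at the baseline time point")

    baseline = averages[baseline_time]
    return [(t, averages[t], averages[t] / baseline) for t in sorted(averages)]


# Hypothetical signals: two replicate muscles per time point (days).
example = [
    (0.0, 100.0), (0.0, 120.0),
    (3.5, 1500.0), (3.5, 1700.0),
    (10.0, 220.0), (10.0, 220.0),
]
summary = summarize_time_series(example)
for t, avg, fold in summary:
    print(f"day {t}: average signal {avg:.1f}, fold-change vs time 0 {fold:.2f}")
```

The individual (time point, signal) pairs correspond to the per-replicate points on the graph, and the returned averages correspond to the line drawn through the time points.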

The resulting on-line graph has mouse-overs containing
data associated with each data point (time point, signal,
present/absent call), and for the arithmetic average (time
point, average signal, fold-change relative to time 0) (Fig. 2).
The mouse-over shown in Figure 2 is for the arithmetic
average of replicates, with the pop-up window indicating the
fold-change from time 0. Clicking over any data point links to
a series of databases (Unigene, GenBank, LocusLink,
Affymetrix) containing information on the gene of interest, as
well as access to the download for the original data set (.cel,
.dat, or .txt files). The tool also writes a dynamically generated
spreadsheet containing all the information in the graph and
this appears as a link above the graph. This spreadsheet can be
downloaded, and analyzed using any desired graphics or
statistical package. It should be emphasized that all visualizations
and spreadsheets are dynamic queries of the web Oracle
database. The dynamic search and output of the 54 profile
murine regeneration series shown here is typically completed
in approximately 15 s.

Figure 2. Graphic output of the time series query for myogenin in muscle regeneration. Muscle degeneration/regeneration was induced with intramuscular
injection of cardiotoxin, and two different muscles profiled at the indicated time points following injection (see (2) for detailed methods). Shown is the
dynamic database query output of myogenin probe set data contained in 54 U74A Affymetrix microarrays for 27 time points (0-40 days). The green data
points are individual expression profiles, with two different muscles profiled and graphed per time point. The purple circle is the average of the replicates,
with the graph drawn between the average at each time point. The y axis is the relative expression level ('signal') using Affymetrix MAS 5.0. The mouse-over
shown corresponds to the average at 3.5 days, and provides both average signal and fold-change relative to time 0 (14.4-fold increase in expression).

The five time series currently implemented for the tool are a
murine in vivo 27 time point muscle regeneration series (54
U74A profiles) (2,3), an 8 time point murine lung calorie
restriction time series (18 U74A profiles) (4) (D.Massaro and
L.B.Clerch, unpublished data), a 5 time point rat spinal cord
damage series (18 U34A profiles) (5), and two 17 time point
methylprednisone bolus time series in rat (47 profiles in liver
and 51 profiles in muscle) (6,7). It is important to note that
many experimental variables, such as diurnal variation in gene
expression, should be considered when interpreting time series
data; for example, in the Massaro and Clerch calorie
restriction studies, non-restricted and calorie-restricted mice
were killed at the same time. We will continue to add
additional time series to the tool, and plan to implement a
collection of time series and non-time series data comparisons
and visualizations to the PEPR resource.

To our knowledge, the time series query tool described here
is the first expression profile data analysis tool that requires no
prior knowledge of microarray data format or data interpretation.
This tool is useful due to the quality control and
replicates available for each time point, and simple visualization,
interpretation and download of these. Future
implementations of our data warehouse will allow input of
externally generated data that conform to minimum experimental
design criteria, and our QC/SOP benchmarks (see
http://microarray.cnmcresearch.org/pgaoutline-qcofsamples.asp)
via a web interface with automated QC/SOP checks. As
PEPR is built upon a standardized platform of Affymetrix-only
data adhering to QC/SOP, all internally- and externally-generated
data within PEPR should be intrinsically comparable.
A new implementation of PEPR including many projects
able to be queried by the SGQT tool described here is expected
in late 2003. The updated PEPR will also include a choice of
probe set algorithm for data display (MAS 5.0, dCHIP, RMA
and ProbeProfiler).


MATERIALS  AND  METHODS 
Expression  profiling 

All expression profiles were generated using total RNA, with
in vitro transcription yielding biotinylated cRNA for hybridization
to Affymetrix GeneChips (see
http://microarray.cnmcresearch.org/pgaoutline-qcofsamples.asp). Only one of
the 38 projects utilized two-round amplifications from limiting
sample (8), and this is clearly indicated in the mouse-over
for that project (see
http://microarray.cnmcresearch.org/pgadatatable.asp).

Data  analysis 

We provide .dat, .cel, and .txt interpretation files using
Affymetrix MAS 5.0 for all microarrays and projects. Other
methods of normalization and probe set interpretation can be
used by downloading any desired file types. The single gene
query tool uses raw .cel file data, normalized via a common
target intensity between all profiles in the project, and provides
information on ‘present/absent’ call determinations, but does
not use these for data analysis purposes. We have recently
shown that the Affymetrix MAS 5.0 probe set interpretation
method provides good signal/noise ratios for expression
profiling projects using tissue samples (9).
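
Normalization to a common target intensity can be illustrated with a simple global scaling sketch: each profile is multiplied by a factor chosen so that its average intensity equals the same target value. This is an assumption-laden illustration, not the tool's actual procedure; in particular, a plain arithmetic mean is used here for brevity, which may differ from the summary statistic the real software uses, and the profile names, probe identifiers, and target value are hypothetical.

```python
from statistics import mean


def scale_to_target(profile, target_intensity):
    """Scale one profile so that its mean intensity equals target_intensity.

    profile: dict mapping probe set identifier -> raw intensity.
    """
    current = mean(profile.values())
    if current <= 0:
        raise ValueError("profile mean intensity must be positive")
    factor = target_intensity / current
    return {probe: value * factor for probe, value in profile.items()}


def normalize_project(profiles, target_intensity):
    """Apply the same target intensity to every profile in a project."""
    return {
        name: scale_to_target(profile, target_intensity)
        for name, profile in profiles.items()
    }


# Hypothetical two-profile project; the second array is twice as bright.
project = {
    "day0_a": {"probe_1": 50.0, "probe_2": 150.0},
    "day0_b": {"probe_1": 100.0, "probe_2": 300.0},
}
normalized = normalize_project(project, target_intensity=200.0)
print(normalized)
```

After scaling, the two hypothetical profiles become directly comparable, since the overall brightness difference between the arrays has been removed.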